Machine Learning Analysis Report

Generated on August 03, 2025 at 09:12 PM

Machine Learning Analysis Pipeline

EDR: Dataset Loading & Preprocessing

EDR – Train/Test Overview
• Train shape: (9561, 20) | Test shape: (818, 20)
• Total train samples: 9,561 | Total test samples: 818
• Number of features: 18
• Target column: 'label'
• Missing values (train): 0 | (test): 0
EDR – Train Class Distribution
• 0: 8,704
• 1: 857
• Class balance (minority/majority): 9.8460%
EDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
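The preparation steps listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the pipeline's actual code: the report does not say how infinite values were "handled", so the sketch assumes they are treated as missing and filled with the train median. The function names `fit_preprocessor` and `transform` are hypothetical. Scaling uses the population standard deviation and falls back to 1.0 for constant columns, which mirrors how scikit-learn's StandardScaler behaves.

```python
import math
import statistics


def _is_valid(x):
    """True for finite numbers; None, NaN and +/-inf count as missing (assumption)."""
    return x is not None and math.isfinite(x)


def fit_preprocessor(train_rows):
    """Learn per-column median, mean and std from the TRAIN rows only."""
    params = []
    for j in range(len(train_rows[0])):
        finite = [r[j] for r in train_rows if _is_valid(r[j])]
        median = statistics.median(finite)
        filled = [r[j] if _is_valid(r[j]) else median for r in train_rows]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled) or 1.0  # avoid dividing by zero
        params.append((median, mean, std))
    return params


def transform(rows, params):
    """Fill missing/infinite values with train medians, then standardize."""
    out = []
    for r in rows:
        new_row = []
        for x, (median, mean, std) in zip(r, params):
            value = x if _is_valid(x) else median
            new_row.append((value - mean) / std)
        out.append(new_row)
    return out


# Hypothetical toy data, not taken from the report.
train = [[1.0, 10.0], [float("inf"), 20.0], [3.0, float("nan")]]
test = [[float("-inf"), 15.0]]

params = fit_preprocessor(train)       # statistics come from train only
test_scaled = transform(test, params)  # the same statistics are reused on test
```

Fitting on train and reusing the same statistics on test keeps test-set information out of the preprocessing step.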
Baseline (Most-Frequent) Accuracy: 0.9095
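The class-balance and baseline figures above are simple ratios, and the short check below reproduces them. The test-set class counts (744 negatives, 74 positives) are taken from the support column of the classification reports later in this document; the variable names are illustrative only.

```python
# Class counts as reported in this document.
train_counts = {0: 8704, 1: 857}
test_counts = {0: 744, 1: 74}  # support values from the classification reports

# Minority/majority ratio on the train set.
balance = min(train_counts.values()) / max(train_counts.values())

# Accuracy of always predicting the most frequent class, scored on the test set.
baseline_test = max(test_counts.values()) / sum(test_counts.values())

# The same quantity on the train set, for comparison.
majority_share_train = max(train_counts.values()) / sum(train_counts.values())

print(f"class balance:       {balance:.4%}")
print(f"baseline (test):     {baseline_test:.4f}")
print(f"majority share (train): {majority_share_train:.4f}")
```

The reported 0.9095 matches the test-set ratio (744 / 818), so the baseline appears to be scored on the test split rather than on train.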

EDR: Model Performance Comparison

EDR – Model Performance Metrics

Model                 | Accuracy | Balanced Acc | Precision | Recall | F1     | ROC-AUC | PR-AUC
Logistic Regression   | 0.8924   | 0.5819       | 0.3409    | 0.2027 | 0.2542 | 0.6143  | 0.2095
Random Forest (SMOTE) | 0.9010   | 0.5927       | 0.4103    | 0.2162 | 0.2832 | 0.7418  | 0.2957
LightGBM              | 0.8900   | 0.6170       | 0.3621    | 0.2838 | 0.3182 | 0.8253  | 0.3332
Balanced RF           | 0.8631   | 0.6874       | 0.3241    | 0.4730 | 0.3846 | 0.8345  | 0.3140
SGD SVM               | 0.8851   | 0.5717       | 0.2917    | 0.1892 | 0.2295 | n/a     | n/a
IsolationForest       | 0.3900   | 0.4821       | 0.0858    | 0.5946 | 0.1499 | n/a     | n/a
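ROC-AUC and PR-AUC are missing for SGD SVM and IsolationForest. The report does not explain why. One plausible reason, stated here as an assumption, is that these models expose no probability estimates. Ranking metrics only need a continuous score, so a decision score could be used instead. The sketch below computes ROC-AUC directly from scores as the probability that a random positive outranks a random negative; `roc_auc_from_scores` is a hypothetical helper, not part of the pipeline.

```python
def roc_auc_from_scores(y_true, scores):
    """ROC-AUC from arbitrary scores (higher = more likely positive).

    Counts, over all positive/negative pairs, how often the positive
    receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores, not data from this report.
y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc_from_scores(y_true, scores)
```

If the models come from scikit-learn, the score would typically be `decision_function`; for IsolationForest that function returns lower values for more anomalous points, so it would need to be negated before being passed in.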

Confusion Matrix Analysis

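Model                 | TN  | FP  | FN | TP | FP Rate | Miss Rate
Logistic Regression   | 715 | 29  | 59 | 15 | 3.90%   | 79.73%
Random Forest (SMOTE) | 721 | 23  | 58 | 16 | 3.09%   | 78.38%
LightGBM              | 707 | 37  | 53 | 21 | 4.97%   | 71.62%
Balanced RF           | 671 | 73  | 39 | 35 | 9.81%   | 52.70%
SGD SVM               | 710 | 34  | 60 | 14 | 4.57%   | 81.08%
IsolationForest       | 275 | 469 | 30 | 44 | 63.04%  | 40.54%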
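Every threshold-dependent metric in this report follows from the four confusion-matrix counts. The sketch below shows the arithmetic, using the EDR Logistic Regression row as a worked example; `metrics_from_counts` is a hypothetical helper written for illustration.

```python
def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics_from_counts(tn, fp, fn, tp):
    """Derive the report's metrics from confusion-matrix counts."""
    recall = _safe_div(tp, tp + fn)       # true positive rate
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    return {
        "accuracy": _safe_div(tp + tn, tn + fp + fn + tp),
        "balanced_acc": (recall + specificity) / 2,
        "precision": _safe_div(tp, tp + fp),
        "recall": recall,
        "f1": _safe_div(2 * tp, 2 * tp + fp + fn),
        "fp_rate": _safe_div(fp, fp + tn),    # 1 - specificity
        "miss_rate": _safe_div(fn, fn + tp),  # 1 - recall
    }


# EDR, Logistic Regression: TN=715, FP=29, FN=59, TP=15 (from the table above).
lr = metrics_from_counts(tn=715, fp=29, fn=59, tp=15)
for name, value in lr.items():
    print(f"{name:13s} {value:.4f}")
```

Rounded to four decimals, the output matches the Logistic Regression row of the performance table (accuracy 0.8924, balanced accuracy 0.5819, precision 0.3409, recall 0.2027, F1 0.2542).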

Best Models by Metric

• Accuracy: Random Forest (SMOTE) (0.9010)
• Balanced Acc: Balanced RF (0.6874)
• Precision: Random Forest (SMOTE) (0.4103)
• Recall: IsolationForest (0.5946)
• F1: Balanced RF (0.3846)
• ROC-AUC: Balanced RF (0.8345)
• PR-AUC: LightGBM (0.3332)
• Lowest False Positive Rate: Random Forest (SMOTE) (3.09%)
• Lowest Miss Rate: IsolationForest (40.54%)
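The per-metric winners listed above can be reproduced by taking the maximum of each metric column while skipping models with no value. The sketch below does this for a subset of the EDR metrics; the dictionary is copied from the performance table, and `best_by_metric` is a hypothetical helper.

```python
import math

NAN = float("nan")  # stands in for the missing ROC-AUC entries

# EDR test-set metrics copied from the performance table (subset of columns).
edr_metrics = {
    "Logistic Regression":   {"Accuracy": 0.8924, "Recall": 0.2027, "F1": 0.2542, "ROC-AUC": 0.6143},
    "Random Forest (SMOTE)": {"Accuracy": 0.9010, "Recall": 0.2162, "F1": 0.2832, "ROC-AUC": 0.7418},
    "LightGBM":              {"Accuracy": 0.8900, "Recall": 0.2838, "F1": 0.3182, "ROC-AUC": 0.8253},
    "Balanced RF":           {"Accuracy": 0.8631, "Recall": 0.4730, "F1": 0.3846, "ROC-AUC": 0.8345},
    "SGD SVM":               {"Accuracy": 0.8851, "Recall": 0.1892, "F1": 0.2295, "ROC-AUC": NAN},
    "IsolationForest":       {"Accuracy": 0.3900, "Recall": 0.5946, "F1": 0.1499, "ROC-AUC": NAN},
}


def best_by_metric(results, metric):
    """Return (model, value) with the highest non-NaN value for `metric`."""
    scored = {m: v[metric] for m, v in results.items() if not math.isnan(v[metric])}
    best = max(scored, key=scored.get)
    return best, scored[best]


for metric in ("Accuracy", "Recall", "F1", "ROC-AUC"):
    model, value = best_by_metric(edr_metrics, metric)
    print(f"{metric:9s} {model} ({value:.4f})")
```

Note that the best accuracy (0.9010) is still below the most-frequent baseline of 0.9095, so accuracy alone does not separate these models from a trivial classifier; the imbalance-aware metrics carry more information here.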

[Figure: EDR – Metrics by Model]

[Figure: EDR – ROC Curves]

[Figure: EDR – Precision–Recall Curves]

[Figure: EDR – Predicted Probability Distributions]

[Figure: EDR – Threshold Sweep]
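The report does not state how the threshold sweep was computed. A common approach, sketched here as an assumption, is to re-binarize the predicted probabilities at each candidate threshold and recompute precision, recall and F1. The function name `threshold_sweep` and the toy data are hypothetical.

```python
def threshold_sweep(y_true, y_prob, thresholds):
    """Precision, recall and F1 of the positive class at each threshold.

    A sample is predicted positive when its probability is >= threshold.
    """
    rows = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, y_prob) if p < t and y == 1)
        rows.append({
            "threshold": t,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        })
    return rows


# Hypothetical toy labels and probabilities, not data from this report.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90]

sweep = threshold_sweep(y_true, y_prob, [0.3, 0.5, 0.7])
for row in sweep:
    print(row)
```

Lowering the threshold trades precision for recall, which is relevant here because most models in the comparison miss more than half of the positive class at their default operating point.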

EDR: Logistic Regression – Detailed Analysis

[Figure: EDR – Logistic Regression: Confusion Matrix]

EDR – Logistic Regression: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9238    | 0.9610 | 0.9420 | 744
1        | 0.3409    | 0.2027 | 0.2542 | 74
accuracy | n/a       | n/a    | 0.8924 | 818

[Figure: EDR – Logistic Regression: Feature Importance]

EDR: Random Forest (SMOTE) – Detailed Analysis

[Figure: EDR – Random Forest (SMOTE): Confusion Matrix]

EDR – Random Forest (SMOTE): Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9255    | 0.9691 | 0.9468 | 744
1        | 0.4103    | 0.2162 | 0.2832 | 74
accuracy | n/a       | n/a    | 0.9010 | 818

[Figure: EDR – Random Forest (SMOTE): Feature Importance]

EDR: LightGBM – Detailed Analysis

[Figure: EDR – LightGBM: Confusion Matrix]

EDR – LightGBM: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9303    | 0.9503 | 0.9402 | 744
1        | 0.3621    | 0.2838 | 0.3182 | 74
accuracy | n/a       | n/a    | 0.8900 | 818

[Figure: EDR – LightGBM: Feature Importance]

EDR: Balanced RF – Detailed Analysis

[Figure: EDR – Balanced RF: Confusion Matrix]

EDR – Balanced RF: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9451    | 0.9019 | 0.9230 | 744
1        | 0.3241    | 0.4730 | 0.3846 | 74
accuracy | n/a       | n/a    | 0.8631 | 818

[Figure: EDR – Balanced RF: Feature Importance]

EDR: SGD SVM – Detailed Analysis

[Figure: EDR – SGD SVM: Confusion Matrix]

EDR – SGD SVM: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9221    | 0.9543 | 0.9379 | 744
1        | 0.2917    | 0.1892 | 0.2295 | 74
accuracy | n/a       | n/a    | 0.8851 | 818

[Figure: EDR – SGD SVM: Feature Importance]

EDR: IsolationForest – Detailed Analysis

[Figure: EDR – IsolationForest: Confusion Matrix]

EDR – IsolationForest: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9016    | 0.3696 | 0.5243 | 744
1        | 0.0858    | 0.5946 | 0.1499 | 74
accuracy | n/a       | n/a    | 0.3900 | 818

EDR – IsolationForest: Feature Importance

Feature importance not available for this model type.

XDR: Dataset Loading & Preprocessing

XDR – Train/Test Overview
• Train shape: (9561, 34) | Test shape: (818, 34)
• Total train samples: 9,561 | Total test samples: 818
• Number of features: 32
• Target column: 'label'
• Missing values (train): 0 | (test): 0
XDR – Train Class Distribution
• 0: 8,704
• 1: 857
• Class balance (minority/majority): 9.8460%
XDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
Baseline (Most-Frequent) Accuracy: 0.9095

XDR: Model Performance Comparison

XDR – Model Performance Metrics

Model                 | Accuracy | Balanced Acc | Precision | Recall | F1     | ROC-AUC | PR-AUC
Logistic Regression   | 0.5232   | 0.5432       | 0.1050    | 0.5676 | 0.1772 | 0.5977  | 0.1980
Random Forest (SMOTE) | 0.9046   | 0.5642       | 0.4231    | 0.1486 | 0.2200 | 0.7342  | 0.3067
LightGBM              | 0.9022   | 0.5994       | 0.4250    | 0.2297 | 0.2982 | 0.8470  | 0.3763
Balanced RF           | 0.8680   | 0.6840       | 0.3333    | 0.4595 | 0.3864 | 0.8263  | 0.2972
SGD SVM               | 0.1198   | 0.5040       | 0.0911    | 0.9730 | 0.1667 | n/a     | n/a
IsolationForest       | 0.8301   | 0.5537       | 0.1649    | 0.2162 | 0.1871 | n/a     | n/a

Confusion Matrix Analysis

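Model                 | TN  | FP  | FN | TP | FP Rate | Miss Rate
Logistic Regression   | 386 | 358 | 32 | 42 | 48.12%  | 43.24%
Random Forest (SMOTE) | 729 | 15  | 63 | 11 | 2.02%   | 85.14%
LightGBM              | 721 | 23  | 57 | 17 | 3.09%   | 77.03%
Balanced RF           | 676 | 68  | 40 | 34 | 9.14%   | 54.05%
SGD SVM               | 26  | 718 | 2  | 72 | 96.51%  | 2.70%
IsolationForest       | 663 | 81  | 58 | 16 | 10.89%  | 78.38%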
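A very low miss rate can simply mean a model flags almost everything. One quick check, sketched below, compares each model's predicted-positive rate, (FP + TP) / total, with the actual positive prevalence in the test set (74 / 818). The counts are copied from the XDR confusion-matrix table above; the 0.5 cut-off used to flag a model is a hypothetical choice for illustration.

```python
# XDR confusion-matrix counts (TN, FP, FN, TP) copied from the table above.
xdr_counts = {
    "Logistic Regression":   (386, 358, 32, 42),
    "Random Forest (SMOTE)": (729, 15, 63, 11),
    "LightGBM":              (721, 23, 57, 17),
    "Balanced RF":           (676, 68, 40, 34),
    "SGD SVM":               (26, 718, 2, 72),
    "IsolationForest":       (663, 81, 58, 16),
}


def predicted_positive_rate(tn, fp, fn, tp):
    """Fraction of test samples the model labels as positive."""
    return (fp + tp) / (tn + fp + fn + tp)


prevalence = 74 / 818  # actual share of positives in the test set
rates = {m: predicted_positive_rate(*c) for m, c in xdr_counts.items()}

# Hypothetical rule: flag models that label more than half the test set positive.
flagged = [m for m, r in rates.items() if r > 0.5]

print(f"prevalence: {prevalence:.4f}")
for model, rate in rates.items():
    print(f"{model:22s} {rate:.4f}")
print("flagged:", flagged)
```

On XDR, SGD SVM labels about 96.6% of the test set as positive, against a prevalence of about 9%, which explains its 2.70% miss rate alongside 0.1198 accuracy.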

Best Models by Metric

• Accuracy: Random Forest (SMOTE) (0.9046)
• Balanced Acc: Balanced RF (0.6840)
• Precision: LightGBM (0.4250)
• Recall: SGD SVM (0.9730)
• F1: Balanced RF (0.3864)
• ROC-AUC: LightGBM (0.8470)
• PR-AUC: LightGBM (0.3763)
• Lowest False Positive Rate: Random Forest (SMOTE) (2.02%)
• Lowest Miss Rate: SGD SVM (2.70%)

[Figure: XDR – Metrics by Model]

[Figure: XDR – ROC Curves]

[Figure: XDR – Precision–Recall Curves]

[Figure: XDR – Predicted Probability Distributions]

[Figure: XDR – Threshold Sweep]

XDR: Logistic Regression – Detailed Analysis

[Figure: XDR – Logistic Regression: Confusion Matrix]

XDR – Logistic Regression: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9234    | 0.5188 | 0.6644 | 744
1        | 0.1050    | 0.5676 | 0.1772 | 74
accuracy | n/a       | n/a    | 0.5232 | 818

[Figure: XDR – Logistic Regression: Feature Importance]

XDR: Random Forest (SMOTE) – Detailed Analysis

[Figure: XDR – Random Forest (SMOTE): Confusion Matrix]

XDR – Random Forest (SMOTE): Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9205    | 0.9798 | 0.9492 | 744
1        | 0.4231    | 0.1486 | 0.2200 | 74
accuracy | n/a       | n/a    | 0.9046 | 818

[Figure: XDR – Random Forest (SMOTE): Feature Importance]

XDR: LightGBM – Detailed Analysis

[Figure: XDR – LightGBM: Confusion Matrix]

XDR – LightGBM: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9267    | 0.9691 | 0.9474 | 744
1        | 0.4250    | 0.2297 | 0.2982 | 74
accuracy | n/a       | n/a    | 0.9022 | 818

[Figure: XDR – LightGBM: Feature Importance]

XDR: Balanced RF – Detailed Analysis

[Figure: XDR – Balanced RF: Confusion Matrix]

XDR – Balanced RF: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9441    | 0.9086 | 0.9260 | 744
1        | 0.3333    | 0.4595 | 0.3864 | 74
accuracy | n/a       | n/a    | 0.8680 | 818

[Figure: XDR – Balanced RF: Feature Importance]

XDR: SGD SVM – Detailed Analysis

[Figure: XDR – SGD SVM: Confusion Matrix]

XDR – SGD SVM: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9286    | 0.0349 | 0.0674 | 744
1        | 0.0911    | 0.9730 | 0.1667 | 74
accuracy | n/a       | n/a    | 0.1198 | 818

[Figure: XDR – SGD SVM: Feature Importance]

XDR: IsolationForest – Detailed Analysis

[Figure: XDR – IsolationForest: Confusion Matrix]

XDR – IsolationForest: Classification Report

Class    | Precision | Recall | F1     | Support
0        | 0.9196    | 0.8911 | 0.9051 | 744
1        | 0.1649    | 0.2162 | 0.1871 | 74
accuracy | n/a       | n/a    | 0.8301 | 818

XDR – IsolationForest: Feature Importance

Feature importance not available for this model type.